Goto

Collaborating Authors

 final plan


Subgoal Graph-Augmented Planning for LLM-Guided Open-World Reinforcement Learning

arXiv.org Artificial Intelligence

Large language models (LLMs) offer strong high-level planning capabilities for reinforcement learning (RL) by decomposing tasks into subgoals. However, their practical utility is limited by poor planning-execution alignment, which reflects a critical gap between abstract plans and actionable, environment-compatible behaviors. This misalignment arises from two interrelated limitations: (1) LLMs often produce subgoals that are semantically plausible but infeasible or irrelevant in the target environment due to insufficient grounding in environment-specific knowledge, and (2) single-LLM planning conflates generation with self-verification, resulting in overconfident yet unreliable subgoals that frequently fail during execution. To address these challenges, we propose Subgoal Graph-Augmented Actor-Critic-Refiner (SGA-ACR), a framework that integrates an environment-specific subgoal graph and structured entity knowledge with a multi-LLM planning pipeline that explicitly separates generation, critique, and refinement to produce executable and verifiable subgoals. A subgoal tracker further monitors execution progress, provides auxiliary rewards, and adaptively updates the subgoal graph to maintain alignment between plans and actions. Experimental results on 22 diverse tasks in the open-world game "Crafter" demonstrate the effectiveness of our proposed method.


Revealing the Barriers of Language Agents in Planning

arXiv.org Artificial Intelligence

Autonomous planning has been an ongoing pursuit since the inception of artificial intelligence. Based on curated problem solvers, early planning agents could deliver precise solutions for specific tasks but lacked generalization. The emergence of large language models (LLMs) and their powerful reasoning capabilities has reignited interest in autonomous planning by automatically generating reasonable solutions for given tasks. However, prior research and our experiments show that current language agents still lack human-level planning abilities. Even the state-of-the-art reasoning model, OpenAI o1, achieves only 15.6% on one of the complex real-world planning benchmarks. This highlights a critical question: What hinders language agents from achieving human-level planning? Although existing studies have highlighted weak performance in agent planning, the deeper underlying issues and the mechanisms and limitations of the strategies proposed to address them remain insufficiently understood. In this work, we apply the feature attribution study and identify two key factors that hinder agent planning: the limited role of constraints and the diminishing influence of questions. We also find that although current strategies help mitigate these challenges, they do not fully resolve them, indicating that agents still have a long way to go before reaching human-level intelligence.


Robots Are Learning Complex Tasks Just by Watching Humans Do Them

#artificialintelligence

Industrial robots used to be big, unwieldy, and dangerous, but new "human-safe" robots are now commonplace on automotive lines, working right next to people. Yet these robots are awkward coworkers; they coexist with us but do not meaningfully collaborate. Robots often need to be explicitly told how to be helpful or when to stay out of the way -- things human teammates seem to learn intuitively. A good human apprentice is a keen observer, inferring unspoken rules and customs, watching how others work, and then generalizing this knowledge for new situations. We are able to accomplish this partly because the human mind is able to process very complex information very efficiently.


Inferring Team Task Plans from Human Meetings: A Generative Modeling Approach with Logic-Based Prior

Journal of Artificial Intelligence Research

We aim to reduce the burden of programming and deploying autonomous systems to work in concert with people in time-critical domains such as military field operations and disaster response. Deployment plans for these operations are frequently negotiated on-the-fly by teams of human planners. A human operator then translates the agreed-upon plan into machine instructions for the robots. We present an algorithm that reduces this translation burden by inferring the final plan from a processed form of the human team's planning conversation. Our hybrid approach combines probabilistic generative modeling with logical plan validation used to compute a highly structured prior over possible plans, enabling us to overcome the challenge of performing inference over a large solution space with only a small amount of noisy data from the team planning session. We validate the algorithm through human subject experimentations and show that it is able to infer a human team's final plan with 86% accuracy on average. We also describe a robot demonstration in which two people plan and execute a first-response collaborative task with a PR2 robot. To the best of our knowledge, this is the first work to integrate a logical planning technique within a generative model to perform plan inference.


Inferring Robot Task Plans from Human Team Meetings: A Generative Modeling Approach with Logic-Based Prior

arXiv.org Machine Learning

We aim to reduce the burden of programming and deploying autonomous systems to work in concert with people in time-critical domains, such as military field operations and disaster response. Deployment plans for these operations are frequently negotiated on-the-fly by teams of human planners. A human operator then translates the agreed upon plan into machine instructions for the robots. We present an algorithm that reduces this translation burden by inferring the final plan from a processed form of the human team's planning conversation. Our approach combines probabilistic generative modeling with logical plan validation used to compute a highly structured prior over possible plans. This hybrid approach enables us to overcome the challenge of performing inference over the large solution space with only a small amount of noisy data from the team planning session. We validate the algorithm through human subject experimentation and show we are able to infer a human team's final plan with 83% accuracy on average. We also describe a robot demonstration in which two people plan and execute a first-response collaborative task with a PR2 robot. To the best of our knowledge, this is the first work that integrates a logical planning technique within a generative model to perform plan inference.